Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Uninitialized Fingerprints in binary_fuse filters. #56

Merged
merged 1 commit into from
Jul 12, 2024

Conversation

SirTyson
Copy link
Contributor

In both versions of binary_fuse*_allocate, filter->Fingerprints is allocated, but uninitialized. During binary_fuse*_populate, this uninitialized valued is used when setting the filter:

filter->Fingerprints[h012[found]] = xor2 ^
                                        filter->Fingerprints[h012[found + 1]] ^
                                        filter->Fingerprints[h012[found + 2]];

This change 0 initializes filter->Fingerprints. I believe this is correct, as in "Binary Fuse Filters: Fast and Smaller Than Xor Filters," H is defined as a 0 initialized array in Algorithm 1.

@thomasmueller
Copy link
Contributor

I think you are right. The fingerprints are not set to zero (except when de-serializing from another source). This doesn't affect functionality, false positive rate etc, because it doesn't matter what data is stored initially in this array. But, I would also expect the array to be initialized to zero. This could be done the allocate method (as you have done), or later in the populate method. If done in allocate, then it will be overwritten in deserialize, which I think is not a problem.

So, I'm OK with this patch.

@lemire
Copy link
Member

lemire commented Jul 11, 2024

@SirTyson I chose malloc deliberately but I don't object to this PR. I expect that the overhead of calloc over malloc is small.

@SirTyson
Copy link
Contributor Author

While I agree that this does not affect false positive rate or correctness, I still think it is a helpful change. In particular, I'm using this library in a blockchain application, where the filter will be part of the ledger and must be bit consistent when constructed on separate nodes. Currently, even with the same initial seed, two nodes will produce different filters at the bit level. While the overall false positive rate is unchanged, I think (but am not an expert on the topic and could be wrong) that two filters constructed with the same initial seed will hit false positives on different keys, which is also not acceptable in a blockchain application.

@lemire lemire merged commit 3c0fd15 into FastFilter:master Jul 12, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants